Dataset: https://catalog.data.gov/dataset/meteorite-landings
This data set contains meteorite landing data from The Meteoritical Society and NASA, covering 34,513 confirmed meteorite landings around the globe. The data was last updated in November 2020.
In the data set, each meteorite has the following fields: Name, ID, NameType, Class, Mass (g), Fall, Year, Latitude, Longitude, and GeoLocation. Each meteorite has a unique Name and ID. NameType has two levels, Valid and Relict, where relict meteorites “were once meteorites but are now highly altered by weathering on Earth”. Class is the meteorite's classification. Mass is recorded in grams. Fall has two levels, Fell and Found: Fell meteorites were observed falling and then recovered, while Found meteorites were discovered without an observed fall. Year is the year the meteorite fell or was found. Latitude and Longitude give each meteorite's coordinates, and GeoLocation combines the two as a single pair.
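library(ggplot2)   # qplot(), ggplot()
library(ggthemes)  # theme_solarized()
library(plot3D)    # scatter3D()
# landings is assumed to already hold the raw table read from the data set above
lands = na.omit(landings) # removes rows that contain an NA
colnames(lands) = c("Name","ID","NameType","Class","Mass","Fall","Year","Latitude","Longitude", "GeoLocation")
# convert categorical columns to factors so summary() reports counts per level
lands$Class = as.factor(lands$Class)
lands$Fall = as.factor(lands$Fall)
#lands$Year = as.factor(lands$Year)
lands$NameType = as.factor(lands$NameType)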
summary(lands)
##      Name                 ID           NameType         Class
##  Length:38115       Min.   :    1   Relict:   21   L6     : 7519
##  Class :character   1st Qu.:10832   Valid :38094   H5     : 6243
##  Mode  :character   Median :21732                  H6     : 3898
##                     Mean   :25343                  H4     : 3880
##                     3rd Qu.:39888                  L5     : 3264
##                     Max.   :57458                  LL5    : 2199
##                                                    (Other):11112
##       Mass               Fall            Year         Latitude
##  Min.   :       0   Fell : 1065   Min.   : 860   Min.   :-87.37
##  1st Qu.:       7   Found:37050   1st Qu.:1986   1st Qu.:-76.72
##  Median :      29                 Median :1996   Median :-71.50
##  Mean   :   15601                 Mean   :1990   Mean   :-39.60
##  3rd Qu.:     187                 3rd Qu.:2002   3rd Qu.:  0.00
##  Max.   :60000000                 Max.   :2101   Max.   : 81.17
##
##    Longitude       GeoLocation
##  Min.   :-165.43   Length:38115
##  1st Qu.:   0.00   Class :character
##  Median :  35.67   Mode  :character
##  Mean   :  61.31
##  3rd Qu.: 157.17
##  Max.   : 178.20
##
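In the summary above, the character columns (Name, GeoLocation) report only Length, Class, and Mode, while the factor columns (NameType, Class, Fall) report a count per level. The following is a minimal sketch of that difference, using a made-up toy data frame rather than the meteorite data; the object names `toy`, `char_summary`, and `fall_counts` are illustrative only.

```r
# Toy stand-in for the meteorite table; the values are made up for illustration.
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  Fall = c("Fell", "Found", "Found", "Found"),
  stringsAsFactors = FALSE
)

# A character column is summarised only by its length, class, and mode.
char_summary <- summary(toy$Fall)

# After conversion to a factor, summary() returns one count per level.
toy$Fall <- as.factor(toy$Fall)
fall_counts <- summary(toy$Fall)

print(char_summary)
print(fall_counts)
```

This is why the report converts NameType, Class, and Fall with as.factor() before calling summary().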
First six meteorites in the data set with their respective data fields.
head(lands)
##       Name  ID NameType       Class   Mass Fall Year  Latitude  Longitude
## 1   Aachen   1    Valid          L5     21 Fell 1880  50.77500    6.08333
## 2   Aarhus   2    Valid          H6    720 Fell 1951  56.18333   10.23333
## 3     Abee   6    Valid         EH4 107000 Fell 1952  54.21667 -113.00000
## 4 Acapulco  10    Valid Acapulcoite   1914 Fell 1976  16.88333  -99.90000
## 5  Achiras 370    Valid          L6    780 Fell 1902 -33.16667  -64.95000
## 6 Adhi Kot 379    Valid         EH4   4239 Fell 1919  32.10000   71.80000
##            GeoLocation
## 1    (50.775, 6.08333)
## 2 (56.18333, 10.23333)
## 3   (54.21667, -113.0)
## 4    (16.88333, -99.9)
## 5  (-33.16667, -64.95)
## 6         (32.1, 71.8)
Structure of each field within the data set.
str(lands)
## 'data.frame': 38115 obs. of 10 variables:
## $ Name : chr "Aachen" "Aarhus" "Abee" "Acapulco" ...
## $ ID : int 1 2 6 10 370 379 390 392 398 417 ...
## $ NameType : Factor w/ 2 levels "Relict","Valid": 2 2 2 2 2 2 2 2 2 2 ...
## $ Class : Factor w/ 422 levels "Acapulcoite",..: 307 182 78 1 313 78 328 175 313 223 ...
## $ Mass : num 21 720 107000 1914 780 ...
## $ Fall : Factor w/ 2 levels "Fell","Found": 1 1 1 1 1 1 1 1 1 1 ...
## $ Year : int 1880 1951 1952 1976 1902 1919 1949 1814 1930 1920 ...
## $ Latitude : num 50.8 56.2 54.2 16.9 -33.2 ...
## $ Longitude : num 6.08 10.23 -113 -99.9 -64.95 ...
## $ GeoLocation: chr "(50.775, 6.08333)" "(56.18333, 10.23333)" "(54.21667, -113.0)" "(16.88333, -99.9)" ...
## - attr(*, "na.action")= 'omit' Named int [1:7601] 13 38 39 77 94 148 173 205 209 263 ...
## ..- attr(*, "names")= chr [1:7601] "13" "38" "39" "77" ...
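The `na.action` attribute at the end of the `str()` output records which rows `na.omit()` dropped: 7,601 rows were removed, leaving 38,115 of the 45,716 rows in the raw table. The sketch below shows the same mechanism on a made-up toy data frame; the names `toy`, `clean`, `dropped`, and `n_dropped` are illustrative and not part of the original analysis.

```r
# Toy data frame with missing values; the values are made up for illustration.
toy <- data.frame(
  Mass = c(21, NA, 720, 4239),
  Year = c(1880, 1951, NA, 1919)
)

# na.omit() drops every row that contains at least one NA.
clean <- na.omit(toy)

# The indices of the dropped rows are stored in the "na.action" attribute.
dropped <- attr(clean, "na.action")
n_dropped <- length(dropped)

# Rows kept plus rows dropped equals the original row count.
print(nrow(clean) + n_dropped == nrow(toy))
```

Checking `length(attr(lands, "na.action"))` in the same way is a quick way to see how much data the cleaning step discarded.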
The meteorite with the greatest mass (g). Hoba, found in Namibia in southern Africa, is the largest known meteorite at 60,000,000 g (60 metric tons); it is thought to have landed around 80,000 years ago.
lands[which.max(lands$Mass),]
##       Name    ID NameType     Class     Mass  Fall Year  Latitude Longitude
## 16393 Hoba 11890    Valid Iron, IVB 60000000 Found 1920 -19.58333  17.91667
##                 GeoLocation
## 16393 (-19.58333, 17.91667)
In this chart I plotted Latitude vs. Longitude, where the size of each dot depends on the meteorite's mass and the color shows its Fall value (Fell or Found).
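qplot(data = lands, x = Longitude, y = Latitude, size = Mass, color = Fall) + theme_solarized() + ggtitle("World-Wide Longitude vs. Latitude")
This chart filters Latitude and Longitude to a window (longitude of -50 or less, latitude of 0 or more) that covers the United States along with the rest of North and Central America. Within the United States, more meteorites were found in the Midwest.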
US = lands[lands$Longitude <= -50,]
US = US[US$Latitude >=0,]
qplot(data = US, x = Longitude, y = Latitude, size = Mass, color = Fall) + theme_solarized() + ggtitle("U.S.A. Longitude vs. Latitude")
Meteorites plotted by Year vs. Mass, colored by Class (legend hidden). Most recorded meteorites date from recent decades: the median Year in the cleaned data is 1996.
qplot(data = lands, x = Mass, y = Year, geom = "point", color = Class, show.legend = FALSE) + xlim(0, 4000) + ylim(1600, 2050) + ggtitle("Year vs. Mass") + theme_solarized()
Histogram of Year from 1850 to 1950, where the fill color shows whether the meteorite fell or was found.
ggplot(lands, aes(Year, fill = Fall)) +
geom_histogram(bins = 30, col = I("black")) + xlim(1850, 1950) + ggtitle("1850-1950 Histogram") + theme_solarized() + ylab("Frequency")
Frequency by Year of meteorites with a NameType of Relict, drawn as a frequency polygon.
pops = lands[lands$NameType == "Relict",]
ggplot(pops, aes(x = Year, color = Class)) + geom_freqpoly(binwidth = 2, size = 2) + xlim(1970, 2015) + theme_solarized() + ggtitle("NameType: Relict") + ylab("Frequency")
This linear regression chart uses only class L6 meteorites. Mass is limited to 0-1000 g and Year to 1990-2000. Most of the meteorites are small, with more being found in the later years.
L6Class = lands[lands$Class == "L6",]
test = L6Class[L6Class$Year >= 1990 & L6Class$Year <= 2000,]
test = test[test$Mass <= 1000,]
x = test$Year
y = test$Mass
lr <- lm(y~x)
plot(x,y, main = "Linear Regression: Mass vs Year of L6", xlab = "Year: 1990-2000", ylab = "Mass: 0-1000g")
points(x, lr$coefficients[1] + lr$coefficients[2] * x, type = "l", col = 4)
This facet chart shows Mass vs. Year for the six most frequent classes. Year is limited to 1950-2000 and Mass to 0-2000 g. Each class is filtered into its own data frame, and the six are then combined. Class LL5 meteorites tend to be smaller than those of the other classes.
H5Class = lands[lands$Class == "H5",]
L5Class = lands[lands$Class == "L5",]
H6Class = lands[lands$Class == "H6",]
H4Class = lands[lands$Class == "H4",]
LL5Class = lands[lands$Class == "LL5",]
Combo = rbind(H5Class, L6Class, L5Class, H6Class, H4Class, LL5Class)
#summary(test)
ggplot(data = Combo, aes(x = Year, y = Mass, color = Class)) + xlim(1950, 2000) + ylim(0, 2000) + geom_point(size = 2) + facet_grid(Class ~ .) + theme_solarized() + ggtitle("Class Comparison")
K-Means: from the plot of within-cluster sum of squares (WSS) against the number of clusters, I concluded that 3 clusters would be ideal for clustering on latitude and longitude.
mat <- cbind( lands$Longitude, lands$Latitude)
mat = na.omit(mat)
clust = lands
wss <- rep(0,15)
for (k in 1:15)
wss[k] <- sum( kmeans(mat,centers=k, nstart=50)$withinss)
plot(wss, type = "b", main = "K-Means", xlab = "Number of clusters (k)", ylab = "WSS")
In this scatter plot, the three clusters roughly follow the larger land masses.
km = kmeans(mat,centers=3)$cluster
clust$cl <- factor( km)
qplot(data = clust, x = Longitude, y = Latitude, color = cl) + theme_solarized() + ggtitle("Clustering: Latitude vs. Longitude")
Instead of the full data set, these 3D scatter plots use only class H5 meteorites from 1900 onward. Most meteorites were found in recent years.
smol = lands[lands$Year >= 1900,] # compare as numbers; quoting 1900 would compare Year as text
smol = smol[smol$Class == "H5",]
scatter3D(smol$Longitude,smol$Latitude,smol$Year,
main="Latitude vs. Longitude vs. Year",
xlab = "Longitude",
ylab = "Latitude",
zlab = "Year")
mats <- cbind(smol$Longitude, smol$Latitude, smol$Year)
km = kmeans(mats,centers=3)$cluster
smol$cl <- km
scatter3D(smol$Longitude,smol$Latitude,smol$Year, colvar=smol$cl,
main="Latitude vs. Longitude vs. Year",
xlab = "Longitude",
ylab = "Latitude",
zlab = "Year")
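The clustering steps above run k-means on raw values with random starting centers, so the result can change between runs and can be dominated by whichever variable has the largest numeric range. The sketch below is one possible way to make such a run repeatable and scale-independent. It uses simulated points, not the meteorite data, and the seed, the simulated cluster locations, and the names `sim`, `sim_scaled`, `wss`, and `fit` are all assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, chosen only so the sketch is repeatable

# Simulated stand-in for Longitude, Latitude, and Year (not the real data).
n <- 150
sim <- cbind(
  Longitude = c(rnorm(n / 3, -100, 5), rnorm(n / 3, 20, 5), rnorm(n / 3, 130, 5)),
  Latitude  = c(rnorm(n / 3, 40, 3), rnorm(n / 3, 10, 3), rnorm(n / 3, -25, 3)),
  Year      = round(runif(n, 1900, 2013))
)

# Standardise each column to mean 0 and standard deviation 1 so that no
# single variable dominates the Euclidean distances used by k-means.
sim_scaled <- scale(sim)

# Elbow data: total within-cluster sum of squares for k = 1..6.
# nstart = 25 tries several random starts and keeps the best solution.
wss <- sapply(1:6, function(k) {
  kmeans(sim_scaled, centers = k, nstart = 25)$tot.withinss
})

# Final fit with 3 clusters, matching the choice made in the text.
fit <- kmeans(sim_scaled, centers = 3, nstart = 25)

print(wss)
print(table(fit$cluster))
```

Whether scaling is appropriate depends on the question: for purely geographic clusters, unscaled longitude and latitude may be the more natural choice.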